Contains information for each fulfillment center
| center_id | city_code | region_code | center_type | op_area |
|---|---|---|---|---|
| 11 | 679 | 56 | TYPE_A | 3.7 |
| 13 | 590 | 56 | TYPE_B | 6.7 |
| 124 | 590 | 56 | TYPE_C | 4 |
| 66 | 648 | 34 | TYPE_A | 4.1 |
| 94 | 632 | 34 | TYPE_C | 3.6 |
| 64 | 553 | 77 | TYPE_A | 4.4 |
Columns:
| name | description |
|---|---|
| center_id | Unique ID for fulfillment center |
| city_code | Unique code for city |
| region_code | Unique code for region |
| center_type | Anonymized center type |
| op_area | Area of operation (in km^2) |
Contains information for each meal being served
| meal_id | category | cuisine |
|---|---|---|
| 1885 | Beverages | Thai |
| 1993 | Beverages | Thai |
| 2539 | Beverages | Thai |
| 1248 | Beverages | Indian |
| 2631 | Beverages | Indian |
| 1311 | Extras | Thai |
Columns:
| name | description |
|---|---|
| meal_id | Unique ID for the meal |
| category | Type of meal (beverages/snacks/soup/…) |
| cuisine | Meal cuisine (Indian/Italian/…) |
Data for testing the model. Same structure as train.csv, but without the target variable
| id | week | center_id | meal_id | checkout_price | base_price | emailer_for_promotion | homepage_featured |
|---|---|---|---|---|---|---|---|
| 1028232 | 146 | 55 | 1885 | 158.11 | 159.11 | 0 | 0 |
| 1127204 | 146 | 55 | 1993 | 160.11 | 159.11 | 0 | 0 |
| 1212707 | 146 | 55 | 2539 | 157.14 | 159.14 | 0 | 0 |
| 1082698 | 146 | 55 | 2631 | 162.02 | 162.02 | 0 | 0 |
| 1400926 | 146 | 55 | 1248 | 163.93 | 163.93 | 0 | 0 |
| 1284113 | 146 | 55 | 1778 | 190.15 | 190.15 | 0 | 0 |
Columns:
| name | description |
|---|---|
| id | Unique ID |
| week | Week No |
| center_id | Unique ID for fulfillment center |
| meal_id | Unique ID for Meal |
| checkout_price | Final price including discount, taxes & delivery charges |
| base_price | Base price of the meal |
| emailer_for_promotion | Emailer sent for promotion of meal |
| homepage_featured | Meal featured at homepage |
Contains the historical demand data for all centers. test.csv contains all of the following features except the target variable
| id | week | center_id | meal_id | checkout_price | base_price | emailer_for_promotion | homepage_featured | num_orders |
|---|---|---|---|---|---|---|---|---|
| 1379560 | 1 | 55 | 1885 | 136.83 | 152.29 | 0 | 0 | 177 |
| 1466964 | 1 | 55 | 1993 | 136.83 | 135.83 | 0 | 0 | 270 |
| 1346989 | 1 | 55 | 2539 | 134.86 | 135.86 | 0 | 0 | 189 |
| 1338232 | 1 | 55 | 2139 | 339.5 | 437.53 | 0 | 0 | 54 |
| 1448490 | 1 | 55 | 2631 | 243.5 | 242.5 | 0 | 0 | 40 |
| 1270037 | 1 | 55 | 1248 | 251.23 | 252.23 | 0 | 0 | 28 |
Columns:
| name | description |
|---|---|
| id | Unique ID |
| week | Week No |
| center_id | Unique ID for fulfillment center |
| meal_id | Unique ID for Meal |
| checkout_price | Final price including discount, taxes & delivery charges |
| base_price | Base price of the meal |
| emailer_for_promotion | Emailer sent for promotion of meal |
| homepage_featured | Meal featured at homepage |
| num_orders | Orders Count (Target) |
We join the fulfillment center and meal datasets with the training dataset.
| id | week | center_id | meal_id | checkout_price | base_price | emailer_for_promotion | homepage_featured | num_orders | city_code | region_code | center_type | op_area | category | cuisine |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1379560 | 1 | 55 | 1885 | 136.83 | 152.29 | 0 | 0 | 177 | 647 | 56 | TYPE_C | 2 | Beverages | Thai |
| 1466964 | 1 | 55 | 1993 | 136.83 | 135.83 | 0 | 0 | 270 | 647 | 56 | TYPE_C | 2 | Beverages | Thai |
| 1346989 | 1 | 55 | 2539 | 134.86 | 135.86 | 0 | 0 | 189 | 647 | 56 | TYPE_C | 2 | Beverages | Thai |
| 1338232 | 1 | 55 | 2139 | 339.5 | 437.53 | 0 | 0 | 54 | 647 | 56 | TYPE_C | 2 | Beverages | Indian |
| 1448490 | 1 | 55 | 2631 | 243.5 | 242.5 | 0 | 0 | 40 | 647 | 56 | TYPE_C | 2 | Beverages | Indian |
| 1270037 | 1 | 55 | 1248 | 251.23 | 252.23 | 0 | 0 | 28 | 647 | 56 | TYPE_C | 2 | Beverages | Indian |
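
The join described above can be sketched in base R as follows. This is a minimal illustration rather than the original code: the data frame names `train`, `centers`, and `meals` are assumptions, only a few columns are kept, and the rows are a small subset copied from the sample tables (the center 55 attributes are taken from the merged table).

```r
# Toy subsets of the three tables (object names are assumptions for illustration).
train <- data.frame(
  id = c(1379560, 1466964, 1448490),
  week = c(1, 1, 1),
  center_id = c(55, 55, 55),
  meal_id = c(1885, 1993, 2631),
  num_orders = c(177, 270, 40)
)
centers <- data.frame(
  center_id = c(55, 11),
  city_code = c(647, 679),
  region_code = c(56, 56),
  center_type = c("TYPE_C", "TYPE_A"),
  op_area = c(2, 3.7)
)
meals <- data.frame(
  meal_id = c(1885, 1993, 2631),
  category = c("Beverages", "Beverages", "Beverages"),
  cuisine = c("Thai", "Thai", "Indian")
)

# Left joins (all.x = TRUE) keep every training row even if a lookup key is missing.
merged <- merge(train, centers, by = "center_id", all.x = TRUE)
merged <- merge(merged, meals, by = "meal_id", all.x = TRUE)

# One row per training observation, with center and meal attributes attached.
print(dim(merged))
```

Note that `merge()` sorts the result by the join key by default, so the row order can differ from that of `train`. Checking that `nrow(merged)` equals `nrow(train)` is a quick way to confirm that the lookup tables contain no duplicated keys.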
This dataset has 15 variables and 456,548 observations. The type of each variable is shown below:
## 'data.frame': 456548 obs. of 15 variables:
## $ id : int 1379560 1466964 1346989 1338232 1448490 1270037 1191377 1499955 1025244 1054194 ...
## $ week : int 1 1 1 1 1 1 1 1 1 1 ...
## $ center_id : int 55 55 55 55 55 55 55 55 55 55 ...
## $ meal_id : int 1885 1993 2539 2139 2631 1248 1778 1062 2707 1207 ...
## $ checkout_price : chr "136.83" "136.83" "134.86" "339.5" ...
## $ base_price : chr "152.29" "135.83" "135.86" "437.53" ...
## $ emailer_for_promotion: int 0 0 0 0 0 0 0 0 0 0 ...
## $ homepage_featured : int 0 0 0 0 0 0 0 0 0 1 ...
## $ num_orders : int 177 270 189 54 40 28 190 391 472 676 ...
## $ city_code : int 647 647 647 647 647 647 647 647 647 647 ...
## $ region_code : int 56 56 56 56 56 56 56 56 56 56 ...
## $ center_type : chr "TYPE_C" "TYPE_C" "TYPE_C" "TYPE_C" ...
## $ op_area : chr "2" "2" "2" "2" ...
## $ category : chr "Beverages" "Beverages" "Beverages" "Beverages" ...
## $ cuisine : chr "Thai" "Thai" "Thai" "Indian" ...
The variables checkout_price, base_price, and op_area are read as character (chr), so they must be converted to numeric values.
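A minimal sketch of this conversion in base R is shown below. The data frame `df` is a toy stand-in built from the sample values above, and the sketch assumes the strings already use a dot as the decimal separator. The source does not say why the columns were read as character, so the real data may need extra cleaning first.

```r
# Toy data frame mimicking the three columns that were read as character.
df <- data.frame(
  checkout_price = c("136.83", "339.5"),
  base_price = c("152.29", "437.53"),
  op_area = c("2", "2"),
  stringsAsFactors = FALSE
)

num_cols <- c("checkout_price", "base_price", "op_area")

# as.numeric() returns NA (with a warning) for strings it cannot parse,
# so counting NAs afterwards reveals malformed values.
df[num_cols] <- lapply(df[num_cols], as.numeric)

str(df)
print(colSums(is.na(df[num_cols])))
```

If the NA counts are non-zero after conversion, the offending strings should be inspected before any statistics are computed.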
To understand each variable, its descriptive statistics are shown below. The table also includes a `date` column that does not appear in the structure listing above.
| | id | week | center_id | meal_id | checkout_price | base_price | emailer_for_promotion | homepage_featured | num_orders | city_code | region_code | center_type | op_area | category | cuisine | date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| nbr.val | 4.565480e+05 | 4.565480e+05 | 4.565480e+05 | 4.565480e+05 | 4.565480e+05 | 4.565480e+05 | 4.565480e+05 | 4.565480e+05 | 4.565480e+05 | 4.565480e+05 | 4.565480e+05 | NA | 4.565480e+05 | NA | NA | 4.565480e+05 |
| nbr.null | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 4.194980e+05 | 4.066930e+05 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | NA | 0.000000e+00 | NA | NA | 0.000000e+00 |
| nbr.na | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | NA | 0.000000e+00 | NA | NA | 0.000000e+00 |
| min | 1.000000e+06 | 1.000000e+00 | 1.000000e+01 | 1.062000e+03 | 2.970000e+00 | 5.535000e+01 | 0.000000e+00 | 0.000000e+00 | 1.300000e+01 | 4.560000e+02 | 2.300000e+01 | NA | 9.000000e-01 | NA | NA | 1.717400e+04 |
| max | 1.499999e+06 | 1.450000e+02 | 1.860000e+02 | 2.956000e+03 | 8.662700e+02 | 8.662700e+02 | 1.000000e+00 | 1.000000e+00 | 2.429900e+04 | 7.130000e+02 | 9.300000e+01 | NA | 7.000000e+00 | NA | NA | 1.818200e+04 |
| range | 4.999990e+05 | 1.440000e+02 | 1.760000e+02 | 1.894000e+03 | 8.633000e+02 | 8.109200e+02 | 1.000000e+00 | 1.000000e+00 | 2.428600e+04 | 2.570000e+02 | 7.000000e+01 | NA | 6.100000e+00 | NA | NA | 1.008000e+03 |
| sum | 5.707290e+11 | 3.413553e+07 | 3.748524e+07 | 9.242072e+08 | 1.516830e+08 | 1.616895e+08 | 3.705000e+04 | 4.985500e+04 | 1.195575e+08 | 2.746380e+08 | 2.584727e+07 | NA | 1.864355e+06 | NA | NA | 8.076508e+09 |
| median | 1.250184e+06 | 7.600000e+01 | 7.600000e+01 | 1.993000e+03 | 2.968200e+02 | 3.104600e+02 | 0.000000e+00 | 0.000000e+00 | 1.360000e+02 | 5.960000e+02 | 5.600000e+01 | NA | 4.000000e+00 | NA | NA | 1.769900e+04 |
| mean | 1.250096e+06 | 7.476877e+01 | 8.210580e+01 | 2.024337e+03 | 3.322389e+02 | 3.541566e+02 | 8.115250e-02 | 1.091999e-01 | 2.618728e+02 | 6.015534e+02 | 5.661457e+01 | NA | 4.083590e+00 | NA | NA | 1.769038e+04 |
| SE.mean | 2.136427e+02 | 6.145620e-02 | 6.804230e-02 | 8.101738e-01 | 2.263482e-01 | 2.378568e-01 | 4.041000e-04 | 4.616000e-04 | 5.859591e-01 | 9.796880e-02 | 2.610880e-02 | NA | 1.615700e-03 | NA | NA | 4.301937e-01 |
| CI.mean.0.95 | 4.187331e+02 | 1.204523e-01 | 1.333608e-01 | 1.587916e+00 | 4.436355e-01 | 4.661921e-01 | 7.921000e-04 | 9.047000e-04 | 1.148462e+00 | 1.920159e-01 | 5.117250e-02 | NA | 3.166700e-03 | NA | NA | 8.431663e-01 |
| var | 2.083831e+10 | 1.724322e+03 | 2.113705e+03 | 2.996697e+05 | 2.339056e+04 | 2.582961e+04 | 7.456690e-02 | 9.727550e-02 | 1.567549e+05 | 4.381899e+03 | 3.112157e+02 | NA | 1.191779e+00 | NA | NA | 8.449178e+04 |
| std.dev | 1.443548e+05 | 4.152496e+01 | 4.597505e+01 | 5.474209e+02 | 1.529397e+02 | 1.607159e+02 | 2.730694e-01 | 3.118902e-01 | 3.959228e+02 | 6.619591e+01 | 1.764131e+01 | NA | 1.091686e+00 | NA | NA | 2.906747e+02 |
| coef.var | 1.154750e-01 | 5.553783e-01 | 5.599488e-01 | 2.704198e-01 | 4.603305e-01 | 4.537990e-01 | 3.364893e+00 | 2.856140e+00 | 1.511890e+00 | 1.100416e-01 | 3.116037e-01 | NA | 2.673350e-01 | NA | NA | 1.643120e-02 |
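
To make the rows of this table concrete, the sketch below recomputes most of them in base R for a single numeric vector. The row labels match those produced by `stat.desc()` from the pastecs package, but the source does not show the call, so that attribution is an assumption. The helper `describe()` is hypothetical, and the confidence-interval row is omitted for brevity.

```r
# First ten num_orders values from the structure listing above.
x <- c(177, 270, 189, 54, 40, 28, 190, 391, 472, 676)

# Hypothetical helper: summary statistics for one numeric vector.
describe <- function(v) {
  n <- sum(!is.na(v))                      # nbr.val: non-missing values
  m <- mean(v, na.rm = TRUE)
  s <- sd(v, na.rm = TRUE)
  c(
    nbr.val  = n,
    nbr.null = sum(v == 0, na.rm = TRUE),  # values equal to zero
    nbr.na   = sum(is.na(v)),              # missing values
    min      = min(v, na.rm = TRUE),
    max      = max(v, na.rm = TRUE),
    range    = diff(range(v, na.rm = TRUE)),
    sum      = sum(v, na.rm = TRUE),
    median   = median(v, na.rm = TRUE),
    mean     = m,
    SE.mean  = s / sqrt(n),                # standard error of the mean
    var      = s^2,
    std.dev  = s,
    coef.var = s / m                       # coefficient of variation
  )
}

stats <- describe(x)
print(round(stats, 3))
```

In the full table, `nbr.null` counts zeros rather than missing values: for `emailer_for_promotion` it equals the number of rows without a promotional email. The `num_orders` column has a mean (about 262) well above its median (136) and a coefficient of variation above 1.5, which points to a strongly right-skewed target.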
| nb_centers | nb_cities | nb_regions | nb_center_type |
|---|---|---|---|
| 77 | 51 | 8 | 3 |
City 590 has the most centers (9), and only 12 cities have more than one center.
Region 56 has the most centers (30), which would correspond to a density of 1.875 centers per region.
Of the 3 center types, TYPE_A is the most common, with 30 centers.
The distribution of center types by region is shown in the following chart, where regions 23, 35, 71, and 93 have only TYPE_A centers.
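Counts like the ones above can be obtained with `unique()` and `table()`. The following base R sketch uses only the six sample centers listed at the start of this document, so its numbers differ from the full-data figures. The object names are assumptions.

```r
# The six sample centers from the first table (op_area omitted).
centers <- data.frame(
  center_id = c(11, 13, 124, 66, 94, 64),
  city_code = c(679, 590, 590, 648, 632, 553),
  region_code = c(56, 56, 56, 34, 34, 77),
  center_type = c("TYPE_A", "TYPE_B", "TYPE_C", "TYPE_A", "TYPE_C", "TYPE_A")
)

# Distinct values per column, analogous to nb_centers, nb_cities, etc.
counts <- sapply(centers, function(col) length(unique(col)))

# Centers per city and per region, largest first.
per_city <- sort(table(centers$city_code), decreasing = TRUE)
per_region <- sort(table(centers$region_code), decreasing = TRUE)

# Regions whose centers are all TYPE_A: the TYPE_A count equals the row total.
types_by_region <- table(centers$region_code, centers$center_type)
only_a <- rownames(types_by_region)[
  types_by_region[, "TYPE_A"] == rowSums(types_by_region)
]

print(counts)
print(per_city)
print(only_a)
```

The cross-tabulation `types_by_region` is also the natural input for a stacked bar chart of center types by region.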
| nb_meals | nb_categories | nb_cuisines |
|---|---|---|
| 51 | 14 | 4 |
The meals are spread across all of the categories: Beverages is the largest with 12 meals, while each of the remaining categories has only 3.
Although Thai is the most common cuisine, all cuisines share one category: Beverages.
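The same kind of check can be sketched for meals with a cuisine-by-category cross-tabulation. The example uses only the six sample meals shown earlier, so the counts are illustrative and do not reproduce the full-data figures. The object names are assumptions.

```r
# The six sample meals from the meal table above.
meals <- data.frame(
  meal_id = c(1885, 1993, 2539, 1248, 2631, 1311),
  category = c("Beverages", "Beverages", "Beverages",
               "Beverages", "Beverages", "Extras"),
  cuisine = c("Thai", "Thai", "Thai", "Indian", "Indian", "Thai")
)

# Meals per category, largest first.
per_category <- sort(table(meals$category), decreasing = TRUE)

# Rows are cuisines, columns are categories.
cross <- table(meals$cuisine, meals$category)

# A category is common to all cuisines if every row has at least one meal in it.
common <- colnames(cross)[colSums(cross > 0) == nrow(cross)]

print(per_category)
print(cross)
print(common)
```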